Using DST's API from Python

data tools
Learn how to extract data from Statistics Denmark's Statistikbanken with Python
Published: February 4, 2022
Modified: December 6, 2024

I am a huge fan of Statistics Denmark (Danmarks Statistik, or DST). Their Statistikbanken contains a wealth of data on Danish society, the economy, and the population.

Not only are all these data publicly available, but DST has for years also provided access to all its published data tables through an API, documented here. The API makes it extremely easy to access and use the data. Yet unless one already has some experience with APIs, using it can be daunting for an occasional student or analyst.

This notebook provides a quick guide on how to access data from DST's Statistikbanken through their API, and presents a utility class I wrote to make it easier to pull data tables for analytical purposes.

The only explicit dependency of that utility is pandas, which is in any case an extremely widespread package.

Both notebook and class can be found at this GitHub repository.

The utility can be installed with

pip install git+https://github.com/alemartinello/dstapi.git
# Start by importing necessary packages
import requests
import pandas as pd
from IPython.display import display
from io import StringIO

from dstapi import DstApi # The helper class

DST's API has four separate function calls for programmatically navigating the published tables. This guide assumes that the analyst has already scouted Statistikbanken and identified the one or two tables from which data should be extracted. For these purposes, we only need two function calls: tableinfo and data.
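For completeness, the other two calls, subjects and tables, let you browse DST's catalogue of subjects and tables programmatically. Below is a minimal sketch of how they could be called with requests; the parameter and field names (subjects, id, description) reflect my reading of the API documentation, so double-check them there before relying on this.

# Sketch: browse the catalogue through the subjects and tables endpoints
# List the top-level subject areas
subjects = requests.post(
    'https://api.statbank.dk/v1/subjects', json={'format': 'JSON', 'lang': 'en'}
).json()
print([(s['id'], s['description']) for s in subjects][:5])

# List the tables available under the first subject area
tables = requests.post(
    'https://api.statbank.dk/v1/tables',
    json={'subjects': [subjects[0]['id']], 'format': 'JSON', 'lang': 'en'}
).json()
print([t['id'] for t in tables][:10])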

The standard process is to begin by obtaining the necessary information from tableinfo, and then construct the call to pass to data.

For each step of the process, this guide first shows how to do it directly with requests (and pandas), and then how the utility class DstApi can facilitate it.

Step 1: Understand what a table has to offer and how it is structured

Our primary example will be DST's table METROX1, which reports an index measuring the weekly number of passengers travelling by metro in Copenhagen. This index was developed to measure the population's response to the COVID-19 pandemic. The table is small and simple, allowing for quick experimentation.

The hard way

As we know the table's name/id, we can start by accessing the API directly through the Python package requests and asking for the table's metadata (tableinfo).

An API call is composed of a main web address, a function call, and a set of parameters. The main web address is https://api.statbank.dk/v1. The function call in this case is tableinfo. The necessary parameters, per the documentation, are the id of the table and the format in which we'd like to receive the information. We'll pick "metrox1" for the first (note that the table-id parameter is not case-sensitive), and "JSON" for the second.

The DST API can be called through both requests.get() and requests.post(). DST's documentation recommends using post because, as the number and complexity of parameters grows (with some of them containing non-standard Danish characters), it becomes harder to embed them in a URL. However, as the call to tableinfo is simple, below I provide examples of both methods.

Note that the .json() method of the requests.Response object returns the response content (which we requested in JSON format) rather than the object itself. That's just to print the output in the notebook.

This function returns a wealth of information: not just the table id and description, but also the contact details of the person responsible for the statistics and, crucially, the names and values of the variables defining the table, in this case SÆSON and Tid.

The code below shows how to get the table’s metadata, and prints the beginning of the JSON file returned.

Show the code
# Directly embed parameters in the URL with requests.get()
requests.get('https://api.statbank.dk/v1' + '/tableinfo' + "?id=metrox1&format=JSON").json()

# Pass a dictionary of parameters to requests.get()
params = {'id': 'metrox1', 'format': 'JSON'}
requests.get('https://api.statbank.dk/v1' + '/tableinfo', params=params).json()

# Use requests.post() - note the change in the name of the parameter for the table's name
# I'm also adding here a language parameter - most tables are available in both Danish and English
params = {'table': 'metrox1', 'format': 'JSON', 'lang':'en'}
table_metadata = requests.post(
    'https://api.statbank.dk/v1' + '/tableinfo', json=params
).json()
print(str(table_metadata).replace(',', ',\n')[0:500] + '\n...')
{'id': 'METROX1',
 'text': 'Workday passenger index in the Copenhagen Metro (experimental statistics)',
 'description': 'Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time',
 'unit': 'Index',
 'suppressedDataValue': '0',
 'updated': '2022-06-16T08:00:00',
 'active': False,
 'contacts': [{'name': 'Peter Ottosen',
 'phone': '+4530429191',
 'mail': 'pot@dst.dk'}],
 'documentation': None,
 'footnote': {'text': 'Data are indexed against an averag
...

This wealth of information is already fantastic. That metadata contains pretty much everything you need to figure out whether you can actually use the table, and how you would eventually want to select the data (seasonally adjusted? For 2020 only?). Yet that JSON file might be tough to digest, especially for more complex tables. Those cases might require preprocessing and a different type of visualization. That's where the DstApi helper class comes into play.
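Before moving to the helper class, here is a minimal sketch of the kind of preprocessing I mean: flattening the variables section of the metadata (assuming, as the full JSON above contains, that each entry has an id and a list of values) into a small overview table.

# Sketch: flatten the 'variables' section of the tableinfo metadata into a DataFrame
# (each entry is assumed to have an 'id' and a list of {'id', 'text'} values)
variables_overview = pd.DataFrame(
    [
        {
            'variable': var['id'],
            'n_values': len(var['values']),
            'first_value': var['values'][0]['id'],
            'last_value': var['values'][-1]['id'],
        }
        for var in table_metadata['variables']
    ]
)
display(variables_overview)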

The easy way

DstApi has two methods for examining metadata.

The first one, tablesummary, summarizes the main metadata information:

  • The id and description of the table
  • The last update time
  • A table with the main available cuts of the data. Each row corresponds to a variable against which we can select, with examples of variable values and labels

Show the code
# Initialize the class with the target table
metro = DstApi('METROX1')

# Get the table summary
metro.tablesummary(language='en')
Table METROX1: Workday passenger index in the Copenhagen Metro (experimental statistics) by seasonal adjustment and time
Last update: 2022-06-16T08:00:00
| | variable name | # values | First value | First value label | Last value | Last value label | Time variable |
|---|---|---|---|---|---|---|---|
| 0 | SÆSON | 2 | 10 | Seasonally adjusted | 11 | Non-seasonally adjusted | False |
| 1 | Tid | 122 | 2020U01 | 2020U01 | 2022U23 | 2022U23 | True |

The second method, variable_levels, zooms into a specific variable and returns a dataframe listing its possible values. For example, we can check each value of SÆSON:

Show the code
metro.variable_levels('SÆSON', language='en')
| | id | text |
|---|---|---|
| 0 | 10 | Seasonally adjusted |
| 1 | 11 | Non-seasonally adjusted |

Now, for this simple table we already knew these values from tablesummary(), as there are only two. But for more complex tables, this method is very handy. Take for example DNVPDKR2, a table showing the outstanding amount of mortgage bonds issued by Danish mortgage institutes.

If, for example, I wanted to extract only data about fixed-rate, convertible bonds, it would be hard to know in advance that I should refer to the value FK.

But I can use

  • the method .tablesummary() to see which variables I can select on
  • the method .variable_levels() to see which values are available for each variable (when selecting data, * is a wildcard that selects all available values for a variable)
Show the code
dnrk = DstApi('DNVPDKR2')
dnrk.tablesummary(language='en')
Table DNVPDKR2: Danish mortgage bonds by type of mortgage bond, original maturity, remaining maturity, coupon (nominal interest rate), currency, issuer, investor sector, covered bonds, data type and time
Last update: 2024-11-28T08:00:00
| | variable name | # values | First value | First value label | Last value | Last value label | Time variable |
|---|---|---|---|---|---|---|---|
| 0 | TYPREAL | 9 | A0 | All mortgage bonds | O | 1.6 Other mortgage bonds | False |
| 1 | LØBETID3 | 7 | A0 | All original maturities | 6 | Other (unspecified) | False |
| 2 | LØBETID2 | 9 | A0 | All remaining maturities | 8 | Other (unspecified) | False |
| 3 | KUPON2 | 15 | A0 | All coupons | N | Other coupons | False |
| 4 | VALUTA | 6 | A0 | All currencies | O | Other | False |
| 5 | UDSTED | 10 | A0 | All issuers | O | Other issuers | False |
| 6 | INVSEKTOR | 10 | A0 | All sectors | U | 2. Foreign (S.2) | False |
| 7 | DAEKOBL | 4 | A0 | All mortgage bonds | RO | RO (other mortgage bonds) | False |
| 8 | DATAT | 5 | N1 | Stock - Nominal | B3 | Value adjustments - Market value | False |
| 9 | Tid | 299 | 1999M12 | 1999M12 | 2024M10 | 2024M10 | True |
Show the code
dnrk.variable_levels('TYPREAL', language='en')
| | id | text |
|---|---|---|
| 0 | A0 | All mortgage bonds |
| 1 | FK | 1.1 Fixed rate convertible bonds |
| 2 | FKU | - - 1.1.1 Open for issue - fixed rate converti... |
| 3 | FKE | - - 1.1.2 No longer open for issue - fixed rat... |
| 4 | I | 1.2 Indexed bonds |
| 5 | RTL | 1.3 Adjustable rate bonds (RTL bonds) |
| 6 | V | 1.4 Bonds with a reference rate (without inter... |
| 7 | VR | 1.5 Bonds with a reference rate (with interest... |
| 8 | O | 1.6 Other mortgage bonds |

Step 2: Get the data you need

The first step is essential for designing this second one. First and foremost, because we need that information to design the call to data. Second, to make sure we only extract the data we need. Asking for too much data only to then throw half of it away locally is wasteful, and ultimately disrespectful of the resources invested into allowing anyone to fire up an API call (I mean, how amazing is that?).

The hard way

As in the first step, we'll start by doing it manually. Here I'll rely exclusively on requests.post(), as recommended by DST.

To select the query parameters to pass to the data function appropriately, one ought to have a careful look at the DATA section of the documentation. Nonetheless, hopefully the examples below will clarify how to construct such calls.

The first two key parameters are, as before, the table name and the format in which we'd like to obtain the data. In the examples below I choose BULK, which has the advantage of being faster and of allowing an unlimited number of rows per export. This format has some limitations, such as the inability to perform simple computations (e.g. sums) on the fly. If you need those features, you probably don't need this guide, so I'll stick with BULK here.

The third crucial parameter is the variable selection, based on the variables shown in e.g. DstApi.tablesummary(). This part is mandatory: we need to specify the selection we want for each variable. We can, however, include a range of possible values, or all of them, in a selection. Here the character * acts as a wildcard: to select all values of a variable we can use *, and to select all 2020 weeks in Tid we could use 2020*.

Below I write the parameters necessary to download the seasonally adjusted (code 10) index for all weeks in the data, and pass them to requests.post(). Finally I print the first 200 characters of the data we received back (in ;-separated format).

params = {
    'table': 'metrox1',
    'format': 'BULK',
    'variables': [
        {'code': 'SÆSON', 'values': ['10']},
        {'code': 'Tid', 'values': ['*']}
    ]
}
r = requests.post('https://api.statbank.dk/v1' + '/data', json=params)
print(r.text[:200])
SÆSON;TID;INDHOLD
Sæsonkorrigeret;2020U01;37,7
Sæsonkorrigeret;2020U08;105,0
Sæsonkorrigeret;2020U09;95,2
Sæsonkorrigeret;2020U10;93,0
Sæsonkorrigeret;2020U11;63,0
Sæsonkorrigeret;2020U12;17,9

Neat! We can then save this data to a csv file or whatever, or directly import it into pandas:

Show the code
pd.read_table(StringIO(r.text), sep=';').head()
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Sæsonkorrigeret | 2020U01 | 37,7 |
| 1 | Sæsonkorrigeret | 2020U08 | 105,0 |
| 2 | Sæsonkorrigeret | 2020U09 | 95,2 |
| 3 | Sæsonkorrigeret | 2020U10 | 93,0 |
| 4 | Sæsonkorrigeret | 2020U11 | 63,0 |
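One small gotcha: in the Danish-language export the decimal separator is a comma, so INDHOLD above is read as text rather than as numbers. Here is a minimal sketch of a fix, using pandas' standard decimal option, which also saves the raw response for later reuse (the filename is just an example):

# Save the raw ;-separated response to disk for later reuse (example filename)
with open('metrox1_seasonally_adjusted.csv', 'w', encoding='utf-8') as f:
    f.write(r.text)

# Tell pandas that ',' is the decimal separator, so INDHOLD is parsed as a float
df_sa = pd.read_table(StringIO(r.text), sep=';', decimal=',')
print(df_sa.dtypes)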

Keep in mind that you can also specify intervals for time variables, as in the example below, where I also require the data to be exported in English.

Show the code
params = {
    'table': 'metrox1',
    'format': 'BULK',
    'lang': 'en',
    'variables': [
        {'code': 'SÆSON', 'values': ['11']},
        {'code': 'Tid', 'values': ['>2020U45<=2020U52']}
    ]
}
df = pd.read_csv(
    StringIO(
        requests.post('https://api.statbank.dk/v1' + '/data', json=params).text
    ), sep=';'
)
df.head()
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Non-seasonally adjusted | 2020U46 | 56.2 |
| 1 | Non-seasonally adjusted | 2020U47 | 55.5 |
| 2 | Non-seasonally adjusted | 2020U48 | 58.3 |
| 3 | Non-seasonally adjusted | 2020U49 | 57.6 |
| 4 | Non-seasonally adjusted | 2020U50 | 46.9 |

The easy-er way

The code above is already quite compact, but to avoid having to remember how to import the data into pandas every time, DstApi has a method that imports the data directly into pandas given a parameter dictionary. So, for example, given the first params dictionary defined above (the seasonally adjusted index for all weeks), we can call directly

Show the code
metro.get_data(params=params)
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Sæsonkorrigeret | 2020U01 | 37.7 |
| 1 | Sæsonkorrigeret | 2020U08 | 105.0 |
| 2 | Sæsonkorrigeret | 2020U09 | 95.2 |
| 3 | Sæsonkorrigeret | 2020U10 | 93.0 |
| 4 | Sæsonkorrigeret | 2020U11 | 63.0 |
| ... | ... | ... | ... |
| 117 | Sæsonkorrigeret | 2022U19 | 99.5 |
| 118 | Sæsonkorrigeret | 2022U20 | 95.7 |
| 119 | Sæsonkorrigeret | 2022U21 | 103.1 |
| 120 | Sæsonkorrigeret | 2022U22 | 108.4 |
| 121 | Sæsonkorrigeret | 2022U23 | 109.3 |

122 rows × 3 columns

The .get_data() method also has the built-in option of downloading an entire data table when no parameter dictionary is passed. As mentioned above, this can be (very) wasteful: some DST tables contain billions of data points. That's why, when used in this way, the method asks for explicit confirmation before proceeding.
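For completeness, a minimal sketch of that usage (harmless for METROX1, which is a small table; the confirmation prompt is interactive):

# Download the entire (small) METROX1 table; the method asks for confirmation first
full_metro = metro.get_data()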

However, creating the params dictionary itself can be challenging. As we saw above with table DNVPDKR2, table structures can be complex, and writing the parameter dictionary by hand can be cumbersome.

That’s why DstApi has a helper method returning a base dictionary of parameters with default values.

Only pass a default params dictionary to .get_data() if you know what you are doing

Some tables in Statistikbanken have millions of records. Downloading them all through the API can take a lot of time, and it's extremely wasteful if you actually need only a fraction of the data.

Show the code
# Start by constructing a basic dictionary
dnrk = DstApi('DNVPDKR2')
params = dnrk.define_base_params(language = 'en')
params
{'table': 'dnvpdkr2',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'TYPREAL', 'values': ['*']},
  {'code': 'LØBETID3', 'values': ['*']},
  {'code': 'LØBETID2', 'values': ['*']},
  {'code': 'KUPON2', 'values': ['*']},
  {'code': 'VALUTA', 'values': ['*']},
  {'code': 'UDSTED', 'values': ['*']},
  {'code': 'INVSEKTOR', 'values': ['*']},
  {'code': 'DAEKOBL', 'values': ['*']},
  {'code': 'DATAT', 'values': ['*']},
  {'code': 'Tid', 'values': ['*']}]}

Once I have the basic structure, I can copy-paste the dictionary definition and use the method variable_levels to refine the selection further. For example, I would like to keep only 30-year bonds, and the output below tells me to use the value 5 for the variable LØBETID3 (likewise, under UDSTED the value RD selects bonds issued by Realkredit Danmark).

Show the code
dnrk.variable_levels('LØBETID3', language='en')
| | id | text |
|---|---|---|
| 0 | A0 | Alle oprindelige løbetider |
| 1 | 1 | Under 10 år |
| 2 | 2 | 10-årige |
| 3 | 3 | 15-årige |
| 4 | 4 | 20-årige |
| 5 | 5 | 30-årige |
| 6 | 6 | Andet (uspecificeret) |

I can then refine my query by filling in the required selection parameters, and call the get_data() method to extract the final dataframe.

Show the code
params = {'table': 'dnvpdkr2',
 'format': 'BULK',
 'lang': 'en',
 'variables': [{'code': 'TYPREAL', 'values': ['FK']},
  {'code': 'LØBETID3', 'values': ['5']},
  {'code': 'LØBETID2', 'values': ['A0']},
  {'code': 'KUPON2', 'values': ['A0']},
  {'code': 'VALUTA', 'values': ['DKK']},
  {'code': 'UDSTED', 'values': ['RD']},
  {'code': 'INVSEKTOR', 'values': ['A0']},
  {'code': 'DAEKOBL', 'values': ['A0']},
  {'code': 'DATAT', 'values': ['N1']},
  {'code': 'Tid', 'values': ['*']}]}
df = dnrk.get_data(params=params, language='en')
df.tail()[["TID", "INDHOLD"]]
| | TID | INDHOLD |
|---|---|---|
| 294 | 2024M06 | 229104 |
| 295 | 2024M07 | 228410 |
| 296 | 2024M08 | 229411 |
| 297 | 2024M09 | 231149 |
| 298 | 2024M10 | 231694 |

And just like that, I have the full time series of RD's 30-year fixed-rate bonds at nominal value.
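From here it is straightforward to work with the series. For example, a minimal sketch that parses DST's monthly period format into proper dates and plots the series (assuming matplotlib is installed, since pandas' .plot() relies on it):

# Parse DST's monthly period format (e.g. '2024M10') into datetimes and plot the series
df['date'] = pd.to_datetime(df['TID'], format='%YM%m')
df.set_index('date')['INDHOLD'].plot(title='RD 30-year fixed-rate bonds, nominal value')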

You can play around with parameters in various ways. For example, here I select a range of weeks in 2020 from the metro table.

Show the code
# Start by constructing a basic dictionary
params = metro.define_base_params(language = 'en')
params['variables'][0]['values'] = ['10']
params['variables'][1]['values'] = ['>2020U45<=2020U52']
metro.get_data(params=params)
| | SÆSON | TID | INDHOLD |
|---|---|---|---|
| 0 | Seasonally adjusted | 2020U46 | 59.0 |
| 1 | Seasonally adjusted | 2020U47 | 54.0 |
| 2 | Seasonally adjusted | 2020U48 | 56.2 |
| 3 | Seasonally adjusted | 2020U49 | 54.5 |
| 4 | Seasonally adjusted | 2020U50 | 44.3 |
| 5 | Seasonally adjusted | 2020U51 | 40.2 |
| 6 | Seasonally adjusted | 2020U52 | 44.8 |

And that’s it!

I hope this guide was useful, and that the DstApi class can prove as helpful to you as it is for me.

Once again, let me conclude with a shout-out to Statistics Denmark, a real national treasure. Thanks for all your work gathering, organizing, and publishing data for everyone to use. It's a fantastic service, and one for which you'll never be thanked enough.